Bookmark

Agentic Misalignment: How LLMs could be insider threats

https://www.anthropic.com/research/agentic-misalignment, posted 9 Aug by peter in ai science toread

In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
